1.
Nature ; 2024 Apr 24.
Article in English | MEDLINE | ID: mdl-38658746

ABSTRACT

Angiosperms are the cornerstone of most terrestrial ecosystems and human livelihoods [1,2]. A robust understanding of angiosperm evolution is required to explain their rise to ecological dominance. So far, the angiosperm tree of life has been determined primarily by means of analyses of the plastid genome [3,4]. Many studies have drawn on this foundational work, such as classification and first insights into angiosperm diversification since their Mesozoic origins [5-7]. However, the limited and biased sampling of both taxa and genomes undermines confidence in the tree and its implications. Here, we build the tree of life for almost 8,000 (about 60%) angiosperm genera using a standardized set of 353 nuclear genes [8]. This 15-fold increase in genus-level sampling relative to comparable nuclear studies [9] provides a critical test of earlier results and brings notable change to key groups, especially in rosids, while substantiating many previously predicted relationships. Scaling this tree to time using 200 fossils, we discovered that early angiosperm evolution was characterized by high gene tree conflict and explosive diversification, giving rise to more than 80% of extant angiosperm orders. Steady diversification ensued through the remaining Mesozoic Era until rates resurged in the Cenozoic Era, concurrent with decreasing global temperatures and tightly linked with gene tree conflict. Taken together, our extensive sampling combined with advanced phylogenomic methods shows the deep history and full complexity in the evolution of a megadiverse clade.

2.
Syst Biol ; 71(2): 301-319, 2022 02 10.
Article in English | MEDLINE | ID: mdl-33983440

ABSTRACT

The tree of life is the fundamental biological roadmap for navigating the evolution and properties of life on Earth, and yet remains largely unknown. Even angiosperms (flowering plants) are fraught with data gaps, despite their critical role in sustaining terrestrial life. Today, high-throughput sequencing promises to significantly deepen our understanding of evolutionary relationships. Here, we describe a comprehensive phylogenomic platform for exploring the angiosperm tree of life, comprising a set of open tools and data based on the 353 nuclear genes targeted by the universal Angiosperms353 sequence capture probes. The primary goals of this article are to (i) document our methods, (ii) describe our first data release, and (iii) present a novel open data portal, the Kew Tree of Life Explorer (https://treeoflife.kew.org). We aim to generate novel target sequence capture data for all genera of flowering plants, exploiting natural history collections such as herbarium specimens, and augment it with mined public data. Our first data release, described here, is the most extensive nuclear phylogenomic data set for angiosperms to date, comprising 3099 samples validated by DNA barcode and phylogenetic tests, representing all 64 orders, 404 families (96%) and 2333 genera (17%). A "first pass" angiosperm tree of life was inferred from the data, which totaled 824,878 sequences, 489,086,049 base pairs, and 532,260 alignment columns, for interactive presentation in the Kew Tree of Life Explorer. This species tree was generated using methods that were rigorous, yet tractable at our scale of operation. Despite limitations pertaining to taxon and gene sampling, gene recovery, models of sequence evolution and paralogy, the tree strongly supports existing taxonomy, while challenging numerous hypothesized relationships among orders and placing many genera for the first time.
The validated data set, species tree and all intermediates are openly accessible via the Kew Tree of Life Explorer and will be updated as further data become available. This major milestone toward a complete tree of life for all flowering plant species opens doors to a highly integrated future for angiosperm phylogenomics through the systematic sequencing of standardized nuclear markers. Our approach has the potential to serve as a much-needed bridge between the growing movement to sequence the genomes of all life on Earth and the vast phylogenomic potential of the world's natural history collections. [Angiosperms; Angiosperms353; genomics; herbariomics; museomics; nuclear phylogenomics; open access; target sequence capture; tree of life.].


Subject(s)
Magnoliopsida , Genomics , High-Throughput Nucleotide Sequencing , Humans , Magnoliopsida/genetics , Phylogeny
3.
Trends Plant Sci ; 24(10): 887-891, 2019 10.
Article in English | MEDLINE | ID: mdl-31477409

ABSTRACT

High-throughput DNA sequencing (HTS) presents great opportunities for plant systematics, yet genomic complexity needs to be reduced for HTS to be effectively applied. We highlight Hyb-Seq as a promising approach, especially in light of the recent development of probes enriching 353 low-copy nuclear genes from any flowering plant taxon.


Subject(s)
Magnoliopsida , Cell Nucleus , Genomics , High-Throughput Nucleotide Sequencing , Phylogeny , Sequence Analysis, DNA
4.
Syst Biol ; 68(4): 594-606, 2019 07 01.
Article in English | MEDLINE | ID: mdl-30535394

ABSTRACT

Sequencing of target-enriched libraries is an efficient and cost-effective method for obtaining DNA sequence data from hundreds of nuclear loci for phylogeny reconstruction. Much of the cost of developing targeted sequencing approaches is associated with the generation of preliminary data needed for the identification of orthologous loci for probe design. In plants, identifying orthologous loci has proven difficult due to a large number of whole-genome duplication events, especially in the angiosperms (flowering plants). We used multiple sequence alignments from over 600 angiosperms for 353 putatively single-copy protein-coding genes identified by the One Thousand Plant Transcriptomes Initiative to design a set of targeted sequencing probes for phylogenetic studies of any angiosperm group. To maximize the phylogenetic potential of the probes, while minimizing the cost of production, we introduce a k-medoids clustering approach to identify the minimum number of sequences necessary to represent each coding sequence in the final probe set. Using this method, 5-15 representative sequences were selected per orthologous locus, representing the sequence diversity of angiosperms more efficiently than if probes were designed using available sequenced genomes alone. To test our approximately 80,000 probes, we hybridized libraries from 42 species spanning all higher-order groups of angiosperms, with a focus on taxa not present in the sequence alignments used to design the probes. Out of a possible 353 coding sequences, we recovered an average of 283 per species and at least 100 in all species. Differences among taxa in sequence recovery could not be explained by relatedness to the representative taxa selected for probe design, suggesting that there is no phylogenetic bias in the probe set. 
Our probe set, which targeted 260 kbp of coding sequence, achieved a median recovery of 137 kbp per taxon in coding regions, a maximum recovery of 250 kbp, and an additional median of 212 kbp per taxon in flanking non-coding regions across all species. These results suggest that the Angiosperms353 probe set described here is effective for any group of flowering plants and would be useful for phylogenetic studies from the species level to higher-order groups, including the entire angiosperm clade itself.
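
As an illustration of the k-medoids idea mentioned in the abstract above, the following is a minimal Python sketch, not the authors' pipeline. It assumes aligned sequences compared by Hamming distance, an exhaustive search over medoid sets (feasible only for toy inputs), and a hypothetical stopping rule, `max_mean_dist`, for choosing the smallest number of representatives. None of these details come from the abstract.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def total_cost(seqs, medoids):
    """Sum of each sequence's distance to its nearest medoid (medoids are indices)."""
    return sum(min(hamming(s, seqs[m]) for m in medoids) for s in seqs)


def k_medoids(seqs, k):
    """Exhaustive k-medoids: evaluate every set of k medoid indices (toy sizes only)."""
    best = min(combinations(range(len(seqs)), k),
               key=lambda ms: total_cost(seqs, ms))
    return list(best), total_cost(seqs, best)


def smallest_k(seqs, max_mean_dist):
    """Smallest k whose medoids keep the mean distance-to-medoid within a threshold.

    The threshold rule is an assumption made for this sketch.
    """
    for k in range(1, len(seqs) + 1):
        medoids, cost = k_medoids(seqs, k)
        if cost / len(seqs) <= max_mean_dist:
            return k, medoids
    raise ValueError("max_mean_dist must be non-negative")


if __name__ == "__main__":
    seqs = ["AAAA", "AAAT", "AATT", "GGGG", "GGGC", "GGCC"]
    print(smallest_k(seqs, 1.0))  # (2, [1, 4])
```

In this toy input two representatives (indices 1 and 4) suffice, one per sequence family. A real probe-design workflow would need a scalable heuristic rather than exhaustive search.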


Subject(s)
DNA Probes , Magnoliopsida/genetics , Sequence Analysis, DNA/methods , Cluster Analysis
5.
Am J Bot ; 105(3): 614-622, 2018 03.
Article in English | MEDLINE | ID: mdl-29603138

ABSTRACT

Providing science and society with an integrated, up-to-date, high quality, open, reproducible and sustainable plant tree of life would be a huge service that is now coming within reach. However, synthesizing the growing body of DNA sequence data in the public domain and disseminating the trees to a diverse audience are often not straightforward due to numerous informatics barriers. While big synthetic plant phylogenies are being built, they remain static and become quickly outdated as new data are published and tree-building methods improve. Moreover, the body of existing phylogenetic evidence is hard to navigate and access for non-experts. We propose that our community of botanists, tree builders, and informaticians should converge on a modular framework for data integration and phylogenetic analysis, allowing easy collaboration, updating, data sourcing and flexible analyses. With support from major institutions, this pipeline should be re-run at regular intervals, storing trees and their metadata long-term. Providing the trees to a diverse global audience through user-friendly front ends and application development interfaces should also be a priority. Interactive interfaces could be used to solicit user feedback and thus improve data quality and to coordinate the generation of new data. We conclude by outlining a number of steps that we suggest the scientific community should take to achieve global phylogenetic synthesis.


Subject(s)
Information Dissemination , Information Management , Phylogeny , Plants/genetics , DNA, Plant , Humans , Information Technology , Sequence Analysis, DNA
6.
Front Plant Sci ; 7: 1540, 2016.
Article in English | MEDLINE | ID: mdl-27822218

ABSTRACT

The appropriate timing of developmental transitions is critical for adapting many crops to their local climatic conditions. Therefore, understanding the genetic basis of different aspects of phenology could be useful in highlighting mechanisms underpinning adaptation, with implications in breeding for climate change. For bread wheat (Triticum aestivum), the transition from vegetative to reproductive growth, the start and rate of leaf senescence and the relative timing of different stages of flowering and grain filling all contribute to plant performance. In this study we screened under Smart house conditions a large, multi-founder "NIAB elite MAGIC" wheat population, to evaluate the genetic elements that influence the timing of developmental stages in European elite varieties. This panel of recombinant inbred lines was derived from eight parents that are or recently have been grown commercially in the UK and Northern Europe. We undertook a detailed temporal phenotypic analysis under Smart house conditions of the population and its parents, to try to identify known or novel Quantitative Trait Loci associated with variation in the timing of key phenological stages in senescence. This analysis resulted in the detection of QTL interactions with novel traits such as the time between "half of ear emergence above flag leaf ligule" and the onset of senescence at the flag leaf as well as traits associated with plant morphology such as stem height. In addition, strong correlations between several traits and the onset of senescence of the flag leaf were identified. This work establishes the value of systematically phenotyping genetically unstructured populations to reveal the genetic architecture underlying morphological variation in commercial wheat.

7.
J Gen Virol ; 97(5): 1145-1157, 2016 05.
Article in English | MEDLINE | ID: mdl-26763979

ABSTRACT

The process by which eukaryotic viruses with segmented genomes select a complete set of genome segments for packaging into progeny virus particles is not understood. In this study a model based on the association of genome segments through specific RNA-RNA interactions driven by base pairing was formalized and tested in the Orbivirus genus of the Reoviridae family. A strategy combining screening of the genomic sequences for inter-segment complementarity with direct functional testing of inter-segment RNA-RNA interactions using reverse genetics is described in the type species of the Orbivirus genus, Bluetongue virus (BTV). Two examples, involving four of the ten BTV genomic segments, of specific inter-segment interaction motifs whose maintenance is essential for the generation of infectious virus, were identified. Equivalent inter-segment complementarities were found between the identified regions of the orthologous genome segments of all orbiviruses, including phylogenetically distant species. Specific interaction of the participating RNA segments was confirmed in vitro using electrophoretic mobility shift assays, with the interactions inhibited using oligonucleotides complementary to the interaction motif of one of the interacting partners, and also through mutagenesis of the motifs. In each example, the base pairing rather than the absolute sequence was critical to the formation of a functional inter-segment interaction, with mutations only being tolerated in rescued virus if compensating changes were made in the interacting partner to restore uninterrupted base pairing. The absolute sequence of the complementarity motifs varied between species, indicating that this newly identified phenomenon may contribute to the observed lack of reassortment between Orbivirus species.


Subject(s)
Genome, Viral , Orbivirus/physiology , Base Sequence , Computational Biology , Nucleic Acid Conformation , RNA, Viral/physiology
8.
Infect Genet Evol ; 32: 440-8, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25861750

ABSTRACT

Full-genome sequences have been used to monitor the fine-scale dynamics of epidemics caused by RNA viruses. However, the ability of this approach to confidently reconstruct transmission trees is limited by the knowledge of the genetic diversity of viruses that exist within different epidemiological units. In order to address this question, this study investigated the variability of 45 foot-and-mouth disease virus (FMDV) genome sequences (from 33 animals) that were collected during 2007 from eight premises (10 different herds) in the United Kingdom. Bayesian and statistical parsimony analysis demonstrated that these sequences exhibited clustering which was consistent with a transmission scenario describing herd-to-herd spread of the virus. As an alternative to analysing all of the available samples in future epidemics, the impact of randomly selecting one sequence from each of these herds was used to assess cost-effective methods that might be used to infer transmission trees during FMD outbreaks. Using these approaches, 85% and 91% of the resulting topologies were either identical or differed by only one edge from a reference tree comprising all of the sequences generated within the outbreak. The sequence distance that accrued during sequential transmission events between epidemiological units was estimated to be 4.6 nucleotides, although the genetic variability between viruses recovered from chronic carrier animals was higher than between viruses from animals with acute-stage infection: an observation which poses challenges for the use of simple approaches to infer transmission trees. This study helps to develop strategies for sampling during FMD outbreaks, and provides data that will guide the development of further models to support control policies in the event of virus incursions into FMD-free countries.


Subject(s)
Cattle Diseases/virology , Foot-and-Mouth Disease Virus/genetics , Foot-and-Mouth Disease/virology , Genetic Variation , Genome, Viral , Animals , Base Sequence , Bayes Theorem , Cattle , Cattle Diseases/epidemiology , Cattle Diseases/transmission , Cluster Analysis , Disease Outbreaks/veterinary , Foot-and-Mouth Disease/transmission , Foot-and-Mouth Disease Virus/classification , Foot-and-Mouth Disease Virus/isolation & purification , Molecular Sequence Data , United Kingdom/epidemiology
9.
Artif Life ; 18(4): 445-60, 2012.
Article in English | MEDLINE | ID: mdl-22938558

ABSTRACT

Plants are frequently wounded by mechanical impact or by insects, and their ability to adequately respond to wounding is essential for their survival and reproductive success. The wound response is mediated by a signal transduction and regulatory network. Molecular studies in Arabidopsis have identified the COI1 gene as a central component of this network. Current models of these networks qualitatively describe the wound response, but they are not directly assessed using quantitative gene expression data. We built a model comprising the key components of the Arabidopsis wound response using the transsys framework. For comparison, we constructed a null model that is devoid of any regulatory interactions, and various alternative models by rewiring the wound response model. All models were parametrized by computational optimization to generate synthetic gene expression profiles that approximate the empirical data set. We scored the fit of the synthetic to the empirical data with various distance measures, and used the median distance after optimization to directly and quantitatively assess the wound response model and its alternatives. Discrimination of candidate models depends substantially on the measure of gene expression profile distance. Using the null model to assess quality of the distance measures for discrimination, we identify correlation of log-ratio profiles as the most suitable distance. Our wound response model fits the empirical data significantly better than the alternative models. Gradual perturbation of the wound response model results in a corresponding gradual decline in fit. The optimization approach provides insights into biologically relevant features, such as robustness. It is a step toward enabling integrative studies of multiple cross-talking pathways, and thus may help to develop our understanding of how the genome informs the mapping of environmental signals to phenotypic traits.
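
The abstract above names "correlation of log-ratio profiles" as a distance between expression profiles but does not give a formula. The Python sketch below shows one plausible reading, stated as an assumption: each profile is converted to log2 ratios against a control profile, and the distance is one minus the Pearson correlation of those ratios. The function names and the choice of control are hypothetical.

```python
import math


def log_ratio_profile(treated, control):
    """log2(treated / control) per measurement; all values must be positive."""
    return [math.log2(t / c) for t, c in zip(treated, control)]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if sd_u == 0 or sd_v == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_u * sd_v)


def log_ratio_correlation_distance(empirical, synthetic, control_e, control_s):
    """Assumed distance: 1 - r, ranging from 0 (same shape) to 2 (opposite shape)."""
    return 1.0 - pearson(log_ratio_profile(empirical, control_e),
                         log_ratio_profile(synthetic, control_s))


if __name__ == "__main__":
    control = [1.0, 1.0, 1.0]
    empirical = [2.0, 4.0, 8.0]
    synthetic = [4.0, 16.0, 64.0]  # larger amplitude, same shape on the log scale
    print(log_ratio_correlation_distance(empirical, synthetic, control, control))
```

Because correlation ignores scale, a synthetic profile with the right shape but the wrong amplitude still scores as close under this reading.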


Subject(s)
Arabidopsis/genetics , Gene Regulatory Networks , Models, Biological , Arabidopsis/physiology , Computer Simulation , Gene Expression Profiling
10.
Algorithms Mol Biol ; 3: 3, 2008 Mar 31.
Article in English | MEDLINE | ID: mdl-18377655

ABSTRACT

BACKGROUND: Experimental identification of microRNA (miRNA) targets is a difficult and time-consuming process. As a consequence, several computational prediction methods have been devised in order to predict targets for follow-up experimental validation. Current computational target prediction methods use only the miRNA sequence as input. With an increasing number of experimentally validated targets becoming available, utilising this additional information in the search for further targets may help to improve the specificity of computational methods for target site prediction. RESULTS: We introduce a generic target prediction method, the Stacking Binding Matrix (SBM), that uses both information about the miRNA as well as experimentally validated target sequences in the search for candidate target sequences. We demonstrate the utility of our method by applying it to both animal and plant data sets and compare it with miRanda, a commonly used target prediction method. CONCLUSION: We show that SBM can be applied to target prediction in both plants and animals and performs well in terms of sensitivity and specificity. Open source code implementing the SBM method, together with documentation and examples, is freely available for download from the address in the Availability and Requirements section.

12.
J Bioinform Comput Biol ; 2(2): 289-307, 2004 Jun.
Article in English | MEDLINE | ID: mdl-15297983

ABSTRACT

Recognition of protein-DNA binding sites in genomic sequences is a crucial step for discovering biological functions of genomic sequences. Explosive growth in availability of sequence information has resulted in a demand for binding site detection methods with high specificity. The motivation of the work presented here is to address this demand by a systematic approach based on Maximum Likelihood Estimation. A general framework is developed in which a large class of binding site detection methods can be described in a uniform and consistent way. Protein-DNA binding is determined by binding energy, which is an approximately linear function within the space of sequence words. All matrix based binding word detectors can be regarded as different linear classifiers which attempt to estimate the linear separation implied by the binding energy function. The standard approaches of consensus sequences and profile matrices are described using this framework. A maximum likelihood approach for determining this linear separation leads to a novel matrix type, called the binding matrix. The binding matrix is the most specific matrix based classifier which is consistent with the input set of known binding words. It achieves significant improvements in specificity compared to other matrices. This is demonstrated using 95 sets of experimentally determined binding words provided by the TRANSFAC database.
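
To make the "matrix as linear classifier" view in the abstract above concrete, here is a minimal Python sketch of a conventional log-odds profile matrix. It is not the maximum-likelihood binding matrix the paper introduces. The pseudocount, the uniform background of 0.25 per base, and the thresholding rule are assumptions chosen for illustration.

```python
import math

ALPHABET = "ACGT"


def profile_matrix(words, pseudocount=1.0):
    """Per-position log2-odds weights estimated from equal-length binding words."""
    length = len(words[0])
    if any(len(w) != length for w in words):
        raise ValueError("all binding words must have the same length")
    matrix = []
    for i in range(length):
        column = [w[i] for w in words]
        total = len(column) + pseudocount * len(ALPHABET)
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in ALPHABET
        })
    return matrix


def score(matrix, word):
    """Linear score: the sum of one weight per position (a dot product with a one-hot word)."""
    return sum(weights[base] for weights, base in zip(matrix, word))


def classify(matrix, word, threshold):
    """Predict 'binding word' when the linear score reaches the threshold."""
    return score(matrix, word) >= threshold


if __name__ == "__main__":
    known = ["TATA", "TATA", "TACA", "TATT"]
    m = profile_matrix(known)
    # Assumed rule: the loosest threshold that still accepts every known word.
    threshold = min(score(m, w) for w in known)
    for candidate in ["TATA", "GGGG"]:
        print(candidate, classify(m, candidate, threshold))
```

The score is additive over positions, which is what makes any such matrix a linear separator in the space of sequence words. Different matrix types differ only in how the weights and threshold are chosen.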


Subject(s)
Algorithms , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , DNA/chemistry , DNA/genetics , Sequence Alignment/methods , Sequence Analysis/methods , Binding Sites/genetics , Likelihood Functions , Protein Binding/genetics , Transcription Factors/chemistry , Transcription Factors/genetics
13.
J Theor Biol ; 220(4): 529-44, 2003 Feb 21.
Article in English | MEDLINE | ID: mdl-12623284

ABSTRACT

Empirically, it has been observed in several cases that the information content of transcription factor binding site sequences (R(sequence)) approximately equals the information content of binding site positions (R(frequency)). A general framework for formal models of transcription factors and binding sites is developed to address this issue. Measures for information content in transcription factor binding sites are revisited and theoretic analyses are compared on this basis. These analyses do not lead to consistent results. A comparative review reveals that these inconsistent approaches do not include a transcription factor state space. Therefore, a state space for mathematically representing transcription factors with respect to their binding site recognition properties is introduced into the modelling framework. Analysis of the resulting comprehensive model shows that the structure of genome state space favours equality of R(sequence) and R(frequency) indeed, but the relation between the two information quantities also depends on the structure of the transcription factor state space. This might lead to significant deviations between R(sequence) and R(frequency). However, further investigation and biological arguments show that the effects of the structure of the transcription factor state space on the relation of R(sequence) and R(frequency) are strongly limited for systems which are autonomous in the sense that all DNA-binding proteins operating on the genome are encoded in the genome itself. This provides a theoretical explanation for the empirically observed equality.
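
The two quantities compared in the abstract above can be computed for a toy case as follows. This Python sketch uses the definitions commonly given in the literature on binding-site information content, which are assumed here and not stated in the abstract: R(sequence) as the sum over aligned positions of 2 bits minus the Shannon entropy of the base frequencies, and R(frequency) as log2 of genome size over number of sites. Small-sample corrections are omitted.

```python
import math
from collections import Counter


def r_sequence(sites):
    """Information content (bits) of aligned DNA sites: sum over columns of 2 - H."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to equal length")
    total = 0.0
    for column in zip(*sites):
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n)
                       for c in Counter(column).values())
        total += 2.0 - entropy
    return total


def r_frequency(genome_size, n_sites):
    """Bits needed to pick n_sites positions out of genome_size candidate positions."""
    return math.log2(genome_size / n_sites)


if __name__ == "__main__":
    # Toy data: three fully conserved positions and one uninformative position.
    sites = ["ACGT", "ACGA", "ACGC", "ACGG"]
    print(r_sequence(sites))    # 6.0
    print(r_frequency(256, 4))  # 6.0
```

In this constructed example the two values coincide at 6 bits, which is the kind of equality the abstract sets out to explain. The toy numbers were chosen to match and carry no biological meaning.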


Subject(s)
Computational Biology , Models, Genetic , Transcription Factors/metabolism , Animals , Binding Sites , Computer Simulation , Genome , Mutation